{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": { "redirect_from": ["https://github.com/azure/azure-search-vector-samples/blob/main/demo-python/code/e2e-demos/azure-ai-search-LM-custom-skill-demo.ipynb"] },
   "source": [
    "# Azure AI Search with Custom LLM Skills Demo\n",
    "\n",
    "In this notebook, we'll demonstrate how to leverage **Azure AI Studio** models such as OpenAI's GPT-4o-mini and Microsoft's Phi35-vision to enhance your Azure AI Search capabilities. By integrating these models, you'll achieve higher index quality, leverage more data for filters and language model assistance, enable multimodal Retrieval-Augmented Generation (RAG), and enjoy the flexibility of choosing from multiple models available in the **Azure AI Studio Model Catalog**.\n",
    "\n",
    "**This unofficial code sample was created for an Ignite 2024 demo. It's offered \"as-is\" and might not work for all customers and all scenarios.**\n",
    "\n",
    "## Benefits\n",
    "- **Higher Index Quality**: Achieve more accurate and comprehensive search indexes.\n",
    "- **Enhanced Data Utilization**: Leverage additional data to improve filters and assist language models.\n",
    "- **Multimodal RAG**: Enable Retrieval-Augmented Generation across different data types and modalities.\n",
    "- **Model Flexibility**: Choose from a variety of models in the **Azure AI Studio Model Catalog** to suit your specific needs.\n",
    "\n",
    "## Prerequisites\n",
    "- 🐍 Python 3.9 or higher\n",
    "- 🔗 [Azure AI Search Service](https://learn.microsoft.com/azure/search/)\n",
    "- 🔗 [Azure AI Inference API](https://learn.microsoft.com/azure/ai-studio/ai-services/model-inference)\n",
    "- 🔗 [Azure AI Studio Model Catalog](https://learn.microsoft.com/azure/ai-studio/how-to/model-catalog-overview)\n",
    "- 🔗 [Phi35-Vision](https://github.com/microsoft/Phi-3CookBook/blob/main/md/01.Introduce/Phi3Family.md)\n",
    "- 🔗 [Azure OpenAI Text Embeddings](https://learn.microsoft.com/azure/search/cognitive-search-skill-azure-openai-embedding)\n",
    "- 🔗 [Azure AI Search Power Skills Repo](https://github.com/Azure-Samples/power-skills)\n",
    "\n",
    "\n",
    "## Features Covered\n",
    "- ✅ Blob Storage Data Source\n",
    "- ✅ Azure OpenAI Text Embeddings\n",
    "- ✅ Integrated Vectorization in Azure AI Search\n",
    "- ✅ Deploying a Custom Azure Function Skill\n",
    "- ✅ Multimodal Retrieval-Augmented Generation (RAG)\n",
    "- ✅ Flexible Model Selection from Azure AI Studio Model Catalog\n",
    "\n",
    "## Scenarios Demonstrated\n",
    "1. **Image Captioning**: Generate descriptive captions for images.\n",
    "2. **Document Summarization**: Create concise summaries of lengthy documents.\n",
    "3. **Entity Extraction**: Extract key entities from documents using custom skills for index augmentation and enrichment.\n",
    "\n",
    "Let's get started!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install required libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install --quiet -r azure-ai-search-LM-custom-skill-requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure Function App Deployment Instructions\n",
    "\n",
    "Go to https://github.com/Azure-Samples/azure-search-power-skills and select either [AzureAIStudioCustomInferenceSkill](https://github.com/Azure-Samples/azure-search-power-skills/tree/main/AzureAIStudioCustomInferenceSkill) or [AzureOpenAICustomInferenceSkill](https://github.com/Azure-Samples/azure-search-power-skills/tree/main/AzureAIStudioCustomInferenceSkill) (or both if you want to leverage multiple language models)\n",
    "\n",
    "## Prerequisites\n",
    "- Azure Functions Core Tools installed\n",
    "- Azure CLI installed and authenticated\n",
    "- Local function app code ready for deployment\n",
    "\n",
    "## Creating the Function Apps\n",
    "\n",
    "Once you have forked the Power Skills Repo, select the Power Skill you want to deploy. Follow the README.MD for the power skill to get it up and running locally. Then, create your function apps using Azure CLI:\n",
    "\n",
    "```bash\n",
    "az functionapp create \\\n",
    "    --resource-group <RESOURCE_GROUP> \\\n",
    "    --consumption-plan-location <LOCATION> \\\n",
    "    --runtime python \\\n",
    "    --runtime-version 3.11 \\\n",
    "    --functions-version 4 \\\n",
    "    --name <FUNCTION_NAME_OPENAI> \\\n",
    "    --os-type linux \\\n",
    "    --storage-account <STORAGE_ACCOUNT>\n",
    "```\n",
    "\n",
    "Replace the following placeholders with your values:\n",
    "- `<RESOURCE_GROUP>`: Your Azure resource group name\n",
    "- `<LOCATION>`: Azure region (e.g., eastus, westeurope)\n",
    "- `<FUNCTION_NAME_OPENAI>`: Name for your new function app\n",
    "- `<STORAGE_ACCOUNT>`: Storage account name to be used by the function app\n",
    "\n",
    "## Deploying the Function Apps\n",
    "\n",
    "Deploy your function apps using Azure Functions Core Tools:\n",
    "\n",
    "```bash\n",
    "# Deploy AI Studio Model as a Service Function App\n",
    "func azure functionapp publish <FUNCTION_APP_NAME_AISTUDIO> --publish-local-settings \n",
    "\n",
    "# Deploy Azure OpenAI Function App\n",
    "func azure functionapp publish <FUNCTION_APP_NAME_OPENAI> --publish-local-settings\n",
    "```\n",
    "\n",
    "## Important Notes\n",
    "- Run each deployment command from the respective function app's root directory\n",
    "- The `--publish-local-settings` flag will upload your local.settings.json configuration\n",
    "- Make sure your local.settings.json contains all necessary application settings\n",
    "- Verify your deployments in the Azure Portal after completion\n",
    "\n",
    "## Verification Steps\n",
    "1. Check the Azure Portal to ensure both function apps are running\n",
    "2. Monitor the function logs for any deployment issues\n",
    "3. Test the endpoints to verify functionality\n",
    "4. Review application settings to ensure all configurations were properly uploaded\n"
   ]
  },
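  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sketch: the custom skill request and response shape\n",
    "\n",
    "When testing the endpoints, it helps to know the payload involved. Azure AI Search calls a custom Web API skill with a JSON body that holds a `values` array, where each item carries a `recordId` and a `data` object, and it expects a response that echoes every `recordId`. The sketch below mimics that contract in plain Python so it can be explored offline. `handle_skill_request` and `fake_summarize` are hypothetical names, the model call is stubbed out, and the deployed Power Skill may well be implemented differently.\n",
    "\n",
    "```python\n",
    "def fake_summarize(text):\n",
    "    # Stand-in for a real model call (assumption): keep only the first sentence.\n",
    "    return text.split('.')[0].strip() + '.'\n",
    "\n",
    "\n",
    "def handle_skill_request(body, scenario):\n",
    "    # Build one output record per input record, echoing each recordId.\n",
    "    results = []\n",
    "    for record in body.get('values', []):\n",
    "        output = {'recordId': record['recordId'], 'data': {}, 'errors': [], 'warnings': []}\n",
    "        text = record.get('data', {}).get('text')\n",
    "        if scenario != 'summarization':\n",
    "            output['errors'].append({'message': 'Unsupported scenario: ' + scenario})\n",
    "        elif not text:\n",
    "            output['warnings'].append({'message': 'No text supplied.'})\n",
    "        else:\n",
    "            output['data']['generative-summary'] = fake_summarize(text)\n",
    "        results.append(output)\n",
    "    return {'values': results}\n",
    "\n",
    "\n",
    "request_body = {\n",
    "    'values': [\n",
    "        {'recordId': '1', 'data': {'text': 'Contoso is moving to the cloud. Costs matter.'}}\n",
    "    ]\n",
    "}\n",
    "response = handle_skill_request(request_body, 'summarization')\n",
    "print(response['values'][0]['data'])\n",
    "```\n",
    "\n",
    "The `scenario` argument mirrors the `scenario` HTTP header that this notebook's skill definitions send, and `generative-summary` is the output name the summarization skill maps into the enrichment tree."
   ]
  },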
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv\n",
    "from azure.core.credentials import AzureKeyCredential\n",
    "from azure.identity import DefaultAzureCredential\n",
    "from azure.search.documents import SearchClient\n",
    "from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    AzureOpenAIEmbeddingSkill,\n",
    "    AzureOpenAIModelName,\n",
    "    AzureOpenAIVectorizer,\n",
    "    AzureOpenAIVectorizerParameters,\n",
    "    FieldMapping,\n",
    "    HnswAlgorithmConfiguration,\n",
    "    HnswParameters,\n",
    "    IndexerExecutionStatus,\n",
    "    InputFieldMappingEntry,\n",
    "    OutputFieldMappingEntry,\n",
    "    SearchField,\n",
    "    SearchableField,\n",
    "    SearchFieldDataType,\n",
    "    SearchIndex,\n",
    "    SearchIndexer,\n",
    "    SearchIndexerDataContainer,\n",
    "    SearchIndexerDataSourceConnection,\n",
    "    SearchIndexerIndexProjectionSelector,\n",
    "    SearchIndexerIndexProjectionsParameters,\n",
    "    SearchIndexerSkillset,\n",
    "    SimpleField,\n",
    "    SplitSkill,\n",
    "    VectorSearch,\n",
    "    VectorSearchAlgorithmMetric,\n",
    "    VectorSearchProfile,\n",
    ")\n",
    "from azure.search.documents.models import (\n",
    "    VectorizableTextQuery,\n",
    ")\n",
    "from azure.search.documents.indexes.models import (\n",
    "    IndexingParameters,\n",
    "    IndexingParametersConfiguration,\n",
    "    SearchIndexerDataSourceConnection,\n",
    "    SearchIndexerDataContainer,\n",
    "    SearchIndexerDataSourceType\n",
    ")\n",
    "from azure.search.documents.indexes import SearchIndexerClient\n",
    "from azure.core.credentials import AzureKeyCredential\n",
    "\n",
    "from azure.search.documents.indexes import SearchIndexerClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    SearchIndexerSkillset, WebApiSkill, AzureMachineLearningSkill,\n",
    "    SplitSkill, MergeSkill, InputFieldMappingEntry, OutputFieldMappingEntry,\n",
    "    SearchIndexerIndexProjectionSelector,\n",
    "    SearchIndexerIndexProjection,\n",
    "    SearchIndexerIndexProjectionsParameters, IndexProjectionMode\n",
    ")\n",
    "from azure.identity import DefaultAzureCredential\n",
    "from dotenv import load_dotenv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load environment variables\n",
    "load_dotenv()\n",
    "\n",
    "# Azure AI Studio PHI Configuration\n",
    "AZURE_AI_STUDIO_PHI_3_API_KEY = os.getenv(\"AZURE_AI_STUDIO_PHI_3_API_KEY\")\n",
    "AZURE_AI_STUDIO_PHI_3_ENDPOINT = os.getenv(\"AZURE_AI_STUDIO_PHI_3_ENDPOINT\")\n",
    "AZURE_OPENAI_ENDPOINT = os.getenv(\"AZURE_OPENAI_ENDPOINT\")\n",
    "AZURE_OPENAI_EMBED_ENDPOINT=os.getenv(\"AZURE_OPENAI_EMBED_ENDPOINT\")\n",
    "AZURE_OPENAI_EMBED_API_KEY=os.getenv(\"AZURE_OPENAI_EMBED_API_KEY\")\n",
    "\n",
    "# Index Names\n",
    "INDEX_NAME=\"phi35-enriched-content\"\n",
    "\n",
    "# Azure Search Service Configuration\n",
    "SEARCH_SERVICE_API_KEY = os.getenv(\"AZURE_SEARCH_ADMIN_KEY\")\n",
    "SEARCH_SERVICE_ENDPOINT = os.getenv(\"AZURE_SEARCH_SERVICE_ENDPOINT\")\n",
    "\n",
    "# Blob Storage Configuration\n",
    "BLOB_CONNECTION_STRING = os.getenv(\"BLOB_CONNECTION_STRING\")\n",
    "BLOB_STORAGE_ACCOUNT_KEY = os.getenv(\"BLOB_STORAGE_ACCOUNT_KEY\")\n",
    "BLOB_CONTAINER_NAME = \"mini-contoso\"\n",
    "\n",
    "# Full Custom Function App URL\n",
    "CUSTOM_PHI3_FUNCTION_BASE_URL=\"YOUR-CUSTOM-PHI3-FUNCTION-APP-URL\"\n",
    "CUSTOM_AOAI_FUNCTION_BASE_URL=\"YOUR-CUSTOM-AOAI-FUNCTION-APP-URL\""
   ]
  },
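  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sketch: checking for missing settings\n",
    "\n",
    "The cell above reads its settings with `os.getenv`, which silently returns `None` for anything absent from the environment or the `.env` file. A small check along these lines can surface gaps before any Azure call fails. `find_missing_settings` is a hypothetical helper, and the example dictionary (including its endpoint URL) stands in for the real variables.\n",
    "\n",
    "```python\n",
    "def find_missing_settings(settings):\n",
    "    # Collect names whose values are None, empty, or still a YOUR-... placeholder.\n",
    "    missing = []\n",
    "    for name, value in settings.items():\n",
    "        if not value or str(value).startswith('YOUR-'):\n",
    "            missing.append(name)\n",
    "    return sorted(missing)\n",
    "\n",
    "\n",
    "# Illustrative values only (assumption): replace with the variables loaded above.\n",
    "example_settings = {\n",
    "    'AZURE_SEARCH_SERVICE_ENDPOINT': 'https://example.search.windows.net',\n",
    "    'AZURE_SEARCH_ADMIN_KEY': None,\n",
    "    'CUSTOM_PHI3_FUNCTION_BASE_URL': 'YOUR-CUSTOM-PHI3-FUNCTION-APP-URL',\n",
    "}\n",
    "print(find_missing_settings(example_settings))\n",
    "```\n",
    "\n",
    "In this notebook, the same function could be given a dictionary built from the constants defined in the previous cell."
   ]
  },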
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using API keys for authentication.\n"
     ]
    }
   ],
   "source": [
    "# User-specified parameter\n",
    "USE_AAD_FOR_SEARCH = False  \n",
    "\n",
    "def authenticate_azure_search(api_key=None, use_aad_for_search=False):\n",
    "    if use_aad_for_search:\n",
    "        print(\"Using AAD for authentication.\")\n",
    "        credential = DefaultAzureCredential()\n",
    "    else:\n",
    "        print(\"Using API keys for authentication.\")\n",
    "        if api_key is None:\n",
    "            raise ValueError(\"API key must be provided if not using AAD for authentication.\")\n",
    "        credential = AzureKeyCredential(api_key)\n",
    "    return credential\n",
    "\n",
    "azure_search_credential = authenticate_azure_search(api_key=SEARCH_SERVICE_API_KEY, use_aad_for_search=USE_AAD_FOR_SEARCH)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data source 'phi35-enriched-content-blob' created or updated\n"
     ]
    }
   ],
   "source": [
    "# Initialize the SearchIndexerClient with a credential\n",
    "indexer_client = SearchIndexerClient(SEARCH_SERVICE_ENDPOINT, azure_search_credential)\n",
    "\n",
    "# Create or update a data source connection\n",
    "container = SearchIndexerDataContainer(name=BLOB_CONTAINER_NAME)\n",
    "data_source_connection = SearchIndexerDataSourceConnection(\n",
    "    name=f\"{INDEX_NAME}-blob\",\n",
    "    type=SearchIndexerDataSourceType.AZURE_BLOB,\n",
    "    connection_string=BLOB_CONNECTION_STRING,\n",
    "    container=container,\n",
    ")\n",
    "data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)\n",
    "\n",
    "print(f\"Data source '{data_source.name}' created or updated\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the SearchIndexClient\n",
    "index_client = SearchIndexClient(\n",
    "    endpoint=SEARCH_SERVICE_ENDPOINT,\n",
    "    credential=azure_search_credential,\n",
    ")\n",
    "\n",
    "# Define the fields to match the index.json schema\n",
    "fields = [\n",
    "    SearchableField(\n",
    "        name=\"chunk_id\",\n",
    "        type=SearchFieldDataType.String,\n",
    "        key=True,\n",
    "        sortable=True,\n",
    "        analyzer_name=\"keyword\",\n",
    "        filterable=True,\n",
    "    ),\n",
    "    SimpleField(\n",
    "        name=\"parent_id\",\n",
    "        type=SearchFieldDataType.String,\n",
    "        filterable=True,\n",
    "        analyzer=\"standard.lucene\",\n",
    "    ),\n",
    "    SearchableField(\n",
    "        name=\"chunk\", type=SearchFieldDataType.String, analyzer_name=\"standard.lucene\"\n",
    "    ),\n",
    "    SearchableField(\n",
    "        name=\"parent_summary\",\n",
    "        type=SearchFieldDataType.String,\n",
    "        analyzer_name=\"standard.lucene\",\n",
    "    ),\n",
    "    SearchableField(\n",
    "        name=\"entities\",\n",
    "        collection=True,\n",
    "        type=SearchFieldDataType.String,\n",
    "        facetable=True,\n",
    "        analyzer_name=\"standard.lucene\",\n",
    "    ),\n",
    "    SearchableField(\n",
    "        name=\"title\", type=SearchFieldDataType.String, analyzer_name=\"standard.lucene\"\n",
    "    ),\n",
    "    SimpleField(\n",
    "        name=\"metadata_storage_path\",\n",
    "        type=SearchFieldDataType.String,\n",
    "        filterable=True,\n",
    "        facetable=True,\n",
    "    ),\n",
    "    SearchField(\n",
    "        name=\"text_embedding\",\n",
    "        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n",
    "        vector_search_dimensions=1536,\n",
    "        vector_search_profile_name=\"my-vector-profile\",\n",
    "        hidden=False,\n",
    "    ),\n",
    "]\n",
    "\n",
    "# Define the vector search configuration\n",
    "vector_search = VectorSearch(\n",
    "    profiles=[\n",
    "        VectorSearchProfile(\n",
    "            name=\"my-vector-profile\",\n",
    "            algorithm_configuration_name=\"my-vector-config\",\n",
    "            vectorizer_name=\"my-vectorizer\",\n",
    "        )\n",
    "    ],\n",
    "    algorithms=[\n",
    "        HnswAlgorithmConfiguration(\n",
    "            name=\"my-vector-config\",\n",
    "            kind=\"hnsw\",\n",
    "            parameters=HnswParameters(metric=VectorSearchAlgorithmMetric.COSINE),\n",
    "        )\n",
    "    ],\n",
    "    vectorizers=[\n",
    "        AzureOpenAIVectorizer(\n",
    "            vectorizer_name=\"my-vectorizer\",\n",
    "            parameters=AzureOpenAIVectorizerParameters(\n",
    "                resource_url=AZURE_OPENAI_ENDPOINT,\n",
    "                deployment_name=\"YOUR-EMBEDDING-DEPLOYMENT-NAME\",\n",
    "                api_key=AZURE_OPENAI_EMBED_API_KEY,\n",
    "                model_name=AzureOpenAIModelName.TEXT_EMBEDDING_3LARGE,\n",
    "            ),\n",
    "        )\n",
    "    ],\n",
    ")\n",
    "\n",
    "\n",
    "# Define the index\n",
    "index = SearchIndex(\n",
    "    name=f\"{INDEX_NAME}-index\",\n",
    "    fields=fields,\n",
    "    vector_search=vector_search,\n",
    ")\n",
    "\n",
    "# Create or update the index\n",
    "result = index_client.create_or_update_index(index)\n",
    "print(f\"{result.name} created\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the SearchIndexerClient\n",
    "client = SearchIndexerClient(\n",
    "    endpoint=SEARCH_SERVICE_ENDPOINT,\n",
    "    credential=azure_search_credential,\n",
    ")\n",
    "\n",
    "def create_image_captioning_skill():\n",
    "    \"\"\"Custom skill for image captioning via a deployed Azure function.\"\"\"\n",
    "    return WebApiSkill(\n",
    "        name=\"Image Captioning Custom Skill\",\n",
    "        description=\"Generates captions for images using a custom API\",\n",
    "        context=\"/document/normalized_images/*\",\n",
    "        uri=f\"{CUSTOM_PHI3_FUNCTION_BASE_URL}/api/custom_skill\",\n",
    "        http_method=\"POST\",\n",
    "        timeout=\"PT1M\",\n",
    "        batch_size=2,\n",
    "        http_headers={\"scenario\": \"image-captioning\"},\n",
    "        inputs=[InputFieldMappingEntry(name=\"image\", source=\"/document/normalized_images/*\")],\n",
    "        outputs=[OutputFieldMappingEntry(name=\"generative-caption\", target_name=\"caption\")]\n",
    "    )\n",
    "\n",
    "def create_merge_skill():\n",
    "    \"\"\"Text merge skill provided by Microsoft to merge content.\"\"\"\n",
    "    return MergeSkill(\n",
    "        name=\"MSFT's Native Text Merge Skill\",\n",
    "        description=\"Merges text content and captions\",\n",
    "        context=\"/document\",\n",
    "        inputs=[\n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/content\"),\n",
    "            InputFieldMappingEntry(name=\"itemsToInsert\", source=\"/document/normalized_images/*/caption\"),\n",
    "            InputFieldMappingEntry(name=\"offsets\", source=\"/document/normalized_images/*/contentOffset\")\n",
    "        ],\n",
    "        outputs=[OutputFieldMappingEntry(name=\"mergedText\", target_name=\"merged_content\")],\n",
    "        insert_pre_tag=\" \",\n",
    "        insert_post_tag=\" \"\n",
    "    )\n",
    "\n",
    "def create_text_summarization_skill():\n",
    "    \"\"\"Custom skill for text summarization using a custom API.\"\"\"\n",
    "    return WebApiSkill(\n",
    "        name=\"Text Summarization Custom Skill\",\n",
    "        description=\"Summarizes merged content using a custom model\",\n",
    "        context=\"/document/merged_content\",\n",
    "        uri=f\"{CUSTOM_PHI3_FUNCTION_BASE_URL}/api/custom_skill\",\n",
    "        http_method=\"POST\",\n",
    "        timeout=\"PT1M\",\n",
    "        batch_size=2,\n",
    "        http_headers={\"scenario\": \"summarization\"},\n",
    "        inputs=[InputFieldMappingEntry(name=\"text\", source=\"/document/merged_content\")],\n",
    "        outputs=[OutputFieldMappingEntry(name=\"generative-summary\", target_name=\"summary\")]\n",
    "    )\n",
    "\n",
    "def create_split_skill():\n",
    "    \"\"\"Skill to split merged text into chunks/pages.\"\"\"\n",
    "    return SplitSkill(\n",
    "        name=\"MSFT Text Split Skill\",\n",
    "        description=\"Splits text into pages\",\n",
    "        context=\"/document/merged_content\",\n",
    "        text_split_mode=\"pages\",\n",
    "        maximum_page_length=512,\n",
    "        page_overlap_length=20,\n",
    "        default_language_code=\"en\",\n",
    "        inputs=[InputFieldMappingEntry(name=\"text\", source=\"/document/merged_content\")],\n",
    "        outputs=[OutputFieldMappingEntry(name=\"textItems\", target_name=\"pages\")]\n",
    "    )\n",
    "\n",
    "def create_openai_embedding_skill():\n",
    "    \"\"\"Defines the embedding skill using Azure OpenAI for text embeddings.\"\"\"\n",
    "    return AzureOpenAIEmbeddingSkill(\n",
    "        name=\"AOAI Embedding Skill\",\n",
    "        description=\"Generates embeddings using Azure OpenAI\",\n",
    "        context=\"/document/merged_content/pages/*\",\n",
    "        resource_url=AZURE_OPENAI_ENDPOINT,\n",
    "        deployment_name=\"text-embedding-3-large\",\n",
    "        api_key=AZURE_OPENAI_EMBED_API_KEY,\n",
    "        model_name=AzureOpenAIModelName.TEXT_EMBEDDING_3LARGE,\n",
    "        inputs=[\n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/merged_content/pages/*\")\n",
    "        ],\n",
    "        outputs=[OutputFieldMappingEntry(name=\"embedding\", target_name=\"text_embedding\")]\n",
    "    )\n",
    "\n",
    "def create_entity_extraction_skill():\n",
    "    \"\"\"Custom skill for entity extraction using a custom API.\"\"\"\n",
    "    return WebApiSkill(\n",
    "        name=\"Entity Extraction Custom Skill\",\n",
    "        description=\"Extracts entities using a custom model\",\n",
    "        context=\"/document/merged_content/pages/*\",\n",
    "        uri=f\"{CUSTOM_PHI3_FUNCTION_BASE_URL}/api/custom_skill\",\n",
    "        http_method=\"POST\",\n",
    "        timeout=\"PT1M\",\n",
    "        batch_size=2,\n",
    "        http_headers={\"scenario\": \"entity-recognition\"},\n",
    "        inputs=[InputFieldMappingEntry(name=\"text\", source=\"/document/merged_content/pages/*\")],\n",
    "        outputs=[OutputFieldMappingEntry(name=\"entities\", target_name=\"entities\")]\n",
    "    )\n",
    "\n",
    "# Create the skillset with all skills\n",
    "def create_skillset(client, skillset_name):\n",
    "    skillset = SearchIndexerSkillset(\n",
    "        name=skillset_name,\n",
    "        description=\"Skillset to chunk documents, use language models to enrich my index, and generate embeddings\",\n",
    "        skills=[\n",
    "            create_image_captioning_skill(),\n",
    "            create_merge_skill(),\n",
    "            create_text_summarization_skill(),\n",
    "            create_split_skill(),\n",
    "            create_openai_embedding_skill(),\n",
    "            create_entity_extraction_skill()\n",
    "        ],\n",
    "        index_projection=SearchIndexerIndexProjection(\n",
    "            selectors=[\n",
    "                SearchIndexerIndexProjectionSelector(\n",
    "                    target_index_name=f\"{INDEX_NAME}-index\",\n",
    "                    parent_key_field_name=\"parent_id\",\n",
    "                    source_context=\"/document/merged_content/pages/*\",\n",
    "                    mappings=[\n",
    "                        InputFieldMappingEntry(name=\"text_embedding\", source=\"/document/merged_content/pages/*/text_embedding\"),\n",
    "                        InputFieldMappingEntry(name=\"chunk\", source=\"/document/merged_content/pages/*\"),\n",
    "                        InputFieldMappingEntry(name=\"parent_summary\", source=\"/document/merged_content/summary\"),\n",
    "                        InputFieldMappingEntry(name=\"entities\", source=\"/document/merged_content/pages/*/entities\"),\n",
    "                        InputFieldMappingEntry(name=\"title\", source=\"/document/title\"),\n",
    "                        InputFieldMappingEntry(name=\"metadata_storage_path\",source=\"/document/metadata_storage_path\")\n",
    "                    ]\n",
    "                )\n",
    "            ],\n",
    "            parameters=SearchIndexerIndexProjectionsParameters(projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS)\n",
    "        )\n",
    "    )\n",
    "    try:\n",
    "        client.create_or_update_skillset(skillset)\n",
    "        print(f\"{skillset.name} created or updated successfully\")\n",
    "    except Exception as e:\n",
    "        print(f\"Failed to create or update skillset {skillset_name}: {e}\")\n",
    "\n",
    "# Usage example\n",
    "skillset_name = f\"{INDEX_NAME}-skillset\"\n",
    "create_skillset(client, skillset_name)\n"
   ]
  },
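  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sketch: how skill outputs become enrichment paths\n",
    "\n",
    "Each skill defined above writes its outputs beneath its `context`, and later skills and the index projection read them back by path. The helper below only illustrates that naming rule. `output_path` is a hypothetical function, not part of the SDK: it joins a skill's context with an output's `target_name` to give the path used downstream.\n",
    "\n",
    "```python\n",
    "def output_path(context, target_name):\n",
    "    # An output lands directly under the skill's context node.\n",
    "    return context.rstrip('/') + '/' + target_name\n",
    "\n",
    "\n",
    "# Contexts and target names copied from the skill definitions in this notebook.\n",
    "paths = {\n",
    "    'caption': output_path('/document/normalized_images/*', 'caption'),\n",
    "    'merged_content': output_path('/document', 'merged_content'),\n",
    "    'summary': output_path('/document/merged_content', 'summary'),\n",
    "    'pages': output_path('/document/merged_content', 'pages'),\n",
    "    'text_embedding': output_path('/document/merged_content/pages/*', 'text_embedding'),\n",
    "}\n",
    "for name, path in paths.items():\n",
    "    print(name, '->', path)\n",
    "```\n",
    "\n",
    "The resulting paths match the `source` values used by the merge skill inputs and by the index projection mappings."
   ]
  },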
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the SearchIndexerClient\n",
    "indexer_client = SearchIndexerClient(\n",
    "    endpoint=SEARCH_SERVICE_ENDPOINT,\n",
    "    credential=azure_search_credential,\n",
    ")\n",
    "\n",
    "def create_and_run_indexer(\n",
    "    indexer_client, indexer_name, skillset_name, index_name, data_source_name\n",
    "):\n",
    "    \"\"\"\n",
    "    Creates and runs an indexer to index documents with embeddings.\n",
    "    \"\"\"\n",
    "    try:\n",
    "        # Define the indexer with necessary parameters\n",
    "        indexer = SearchIndexer(\n",
    "            name=indexer_name,\n",
    "            description=\"Indexer for Ignite Demo with OpenAI Embeddings\",\n",
    "            skillset_name=skillset_name,\n",
    "            target_index_name=index_name,\n",
    "            data_source_name=data_source_name,\n",
    "            parameters=IndexingParameters(\n",
    "                configuration=IndexingParametersConfiguration(\n",
    "                    data_to_extract=\"contentAndMetadata\",\n",
    "                    parsing_mode=\"default\",\n",
    "                    image_action=\"generateNormalizedImages\",\n",
    "                    query_timeout=None,\n",
    "                ),\n",
    "            ),\n",
    "            field_mappings=[\n",
    "                FieldMapping(\n",
    "                    source_field_name=\"metadata_storage_name\", target_field_name=\"title\"\n",
    "                ),\n",
    "                FieldMapping(\n",
    "                    source_field_name=\"metadata_storage_path\",\n",
    "                    target_field_name=\"metadata_storage_path\",\n",
    "                ),\n",
    "            ],\n",
    "        )\n",
    "\n",
    "        # Create or update the indexer\n",
    "        indexer_client.create_or_update_indexer(indexer)\n",
    "        print(f\"{indexer_name} created or updated successfully.\")\n",
    "\n",
    "        # Run the indexer\n",
    "        indexer_client.run_indexer(indexer_name)\n",
    "        print(f\"{indexer_name} is running. Please wait for indexing to complete.\")\n",
    "\n",
    "    except Exception as e:\n",
    "        print(f\"Failed to create or run indexer {indexer_name}: {e}\")\n",
    "\n",
    "\n",
    "# Main workflow\n",
    "data_source_name = f\"{INDEX_NAME}-blob\"\n",
    "indexer_name = f\"{INDEX_NAME}-indexer\"\n",
    "skillset_name = f\"{INDEX_NAME}-skillset\"\n",
    "\n",
    "create_and_run_indexer(\n",
    "    indexer_client, indexer_name, skillset_name, f\"{INDEX_NAME}-index\", data_source_name\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "indexer_last_result = indexer_client.get_indexer_status(indexer_name).last_result\n",
    "indexer_status = IndexerExecutionStatus.IN_PROGRESS if indexer_last_result is None  else indexer_last_result.status\n",
    "\n",
    "while(indexer_status == IndexerExecutionStatus.IN_PROGRESS):\n",
    "    indexer_last_result = indexer_client.get_indexer_status(indexer_name).last_result\n",
    "    indexer_status = IndexerExecutionStatus.IN_PROGRESS if indexer_last_result is None  else indexer_last_result.status\n",
    "    print(f\"Indexer '{indexer_name}' is still running. Current status: '{indexer_status}'.\")\n",
    "\n",
    "print(f\"Indexer '{indexer_name}' finished with status '{indexer_status}'.\")"
   ]
  },
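  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sketch: polling with a timeout\n",
    "\n",
    "A status loop like the one above is easier to reuse and test when the waiting logic is separated from the Azure call. The following is one possible shape, not an SDK feature: `wait_for` is a hypothetical helper that calls a status function until it stops returning the in-progress value or a timeout elapses. The fake status source stands in for `indexer_client.get_indexer_status`, and the short interval is only suitable for this offline demo.\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "\n",
    "def wait_for(get_status, in_progress='inProgress', interval=0.01, timeout=1.0):\n",
    "    # Poll get_status() until it returns something other than in_progress.\n",
    "    deadline = time.monotonic() + timeout\n",
    "    while True:\n",
    "        status = get_status()\n",
    "        if status != in_progress:\n",
    "            return status\n",
    "        if time.monotonic() >= deadline:\n",
    "            raise TimeoutError('Still in progress after ' + str(timeout) + ' seconds')\n",
    "        time.sleep(interval)\n",
    "\n",
    "\n",
    "# Fake status source: reports in-progress twice, then success.\n",
    "fake_statuses = iter(['inProgress', 'inProgress', 'success'])\n",
    "final_status = wait_for(lambda: next(fake_statuses))\n",
    "print(final_status)\n",
    "```\n",
    "\n",
    "Against a real indexer, a longer interval (for example 10 seconds) and a timeout sized to the expected indexing time would be more appropriate."
   ]
  },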
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hybrid Search\n",
    "query = \"What are Contoso's goals for migrating to the cloud?\"  \n",
    "\n",
    "search_client = SearchClient(SEARCH_SERVICE_ENDPOINT, f\"{INDEX_NAME}-index\", azure_search_credential)\n",
    "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields=\"text_embedding\")\n",
    "  \n",
    "results = search_client.search(  \n",
    "    search_text=query,  \n",
    "    vector_queries= [vector_query],\n",
    "    select=[\"title\", \"chunk\", \"metadata_storage_path\"],\n",
    "    top=1\n",
    ")  \n",
    "  \n",
    "for result in results:  \n",
    "    print(f\"Title: {result['title']}\")  \n",
    "    print(f\"Content: {result['chunk']}\")  \n",
    "    print(f\"Score: {result['@search.score']}\")  \n",
    "    print(f\"Path: {result['metadata_storage_path']}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
